{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Contextual Compression in Document Retrieval\n",
    "\n",
    "## Overview\n",
    "\n",
    "This code demonstrates the implementation of contextual compression in a document retrieval system using LangChain and OpenAI's language models. The technique aims to improve the relevance and conciseness of retrieved information by compressing and extracting the most pertinent parts of documents in the context of a given query.\n",
    "\n",
    "## Motivation\n",
    "\n",
    "Traditional document retrieval systems often return entire chunks or documents, which may contain irrelevant information. Contextual compression addresses this by intelligently extracting and compressing only the most relevant parts of retrieved documents, leading to more focused and efficient information retrieval.\n",
    "\n",
    "## Key Components\n",
    "\n",
    "1. Vector store creation from a PDF document\n",
    "2. Base retriever setup\n",
    "3. LLM-based contextual compressor\n",
    "4. Contextual compression retriever\n",
    "5. Question-answering chain integrating the compressed retriever\n",
    "\n",
    "## Method Details\n",
    "\n",
    "### Document Preprocessing and Vector Store Creation\n",
    "\n",
    "1. The PDF is processed and encoded into a vector store using a custom `encode_pdf` function.\n",
    "\n",
    "### Retriever and Compressor Setup\n",
    "\n",
    "1. A base retriever is created from the vector store.\n",
    "2. An LLM-based contextual compressor (LLMChainExtractor) is initialized using OpenAI's GPT-4 model.\n",
    "\n",
    "### Contextual Compression Retriever\n",
    "\n",
    "1. The base retriever and compressor are combined into a ContextualCompressionRetriever.\n",
    "2. This retriever first fetches documents using the base retriever, then applies the compressor to extract the most relevant information.\n",
    "\n",
    "### Question-Answering Chain\n",
    "\n",
    "1. A RetrievalQA chain is created, integrating the compression retriever.\n",
    "2. This chain uses the compressed and extracted information to generate answers to queries.\n",
    "\n",
    "## Benefits of this Approach\n",
    "\n",
    "1. Improved relevance: The system returns only the most pertinent information to the query.\n",
    "2. Increased efficiency: By compressing and extracting relevant parts, it reduces the amount of text the LLM needs to process.\n",
    "3. Enhanced context understanding: The LLM-based compressor can understand the context of the query and extract information accordingly.\n",
    "4. Flexibility: The system can be easily adapted to different types of documents and queries.\n",
    "\n",
    "## Conclusion\n",
    "\n",
    "Contextual compression in document retrieval offers a powerful way to enhance the quality and efficiency of information retrieval systems. By intelligently extracting and compressing relevant information, it provides more focused and context-aware responses to queries. This approach has potential applications in various fields requiring efficient and accurate information retrieval from large document collections."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align: center;\">\n",
    "\n",
    "<img src=\"../images/contextual_compression.svg\" alt=\"contextual compression\" style=\"width:70%; height:auto;\">\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "from dotenv import load_dotenv\n",
    "from langchain.retrievers.document_compressors import LLMChainExtractor\n",
    "from langchain.retrievers import ContextualCompressionRetriever\n",
    "from langchain.chains import RetrievalQA\n",
    "\n",
    "\n",
    "sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..'))) # Add the parent directory to the path sicnce we work with notebooks\n",
    "from helper_functions import *\n",
    "from evaluation.evalute_rag import *\n",
    "\n",
    "# Load environment variables from a .env file\n",
    "load_dotenv()\n",
    "\n",
    "# Set the OpenAI API key environment variable\n",
    "os.environ[\"OPENAI_API_KEY\"] = os.getenv('OPENAI_API_KEY')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define document's path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = \"../data/Understanding_Climate_Change.pdf\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = encode_pdf(path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a retriever + contexual compressor + combine them "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a retriever\n",
    "retriever = vector_store.as_retriever()\n",
    "\n",
    "\n",
    "#Create a contextual compressor\n",
    "llm = ChatOpenAI(temperature=0, model_name=\"gpt-4o-mini\", max_tokens=4000)\n",
    "compressor = LLMChainExtractor.from_llm(llm)\n",
    "\n",
    "#Combine the retriever with the compressor\n",
    "compression_retriever = ContextualCompressionRetriever(\n",
    "    base_compressor=compressor,\n",
    "    base_retriever=retriever\n",
    ")\n",
    "\n",
    "# Create a QA chain with the compressed retriever\n",
    "qa_chain = RetrievalQA.from_chain_type(\n",
    "    llm=llm,\n",
    "    retriever=compression_retriever,\n",
    "    return_source_documents=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The main topic of the document is climate change, focusing on international collaboration, national strategies, policy development, and the ethical dimensions of climate justice. It discusses frameworks like the UNFCCC and the Paris Agreement, as well as the importance of sustainable practices for future generations.\n",
      "Source documents: [Document(metadata={'source': '../data/Understanding_Climate_Change.pdf', 'page': 9}, page_content='Chapter 6: Global and Local Climate Action  \\nInternational Collaboration  \\nUnited Nations Framework Convention on Climate Change (UNFCCC)  \\nThe UNFCCC is an international treaty aimed at addressing climate change. It provides a \\nframework for negotiating specific protocols and agreements, such as the Kyoto Protocol and \\nthe Paris Agreement. Global cooperation under the UNFCCC is crucial for coordinated \\nclimate action.  \\nParis Agreement  \\nThe Paris Agreement, adopted in 2015, aims to limit global warming to well below 2 degrees \\nCelsius above pre-industrial levels, with efforts to limit the increase to 1.5 degrees Celsius. \\nCountries submit nationally determined contributions (NDCs) outlining their climate action \\nplans and targets.  \\nNational Strategies  \\nCarbon Pricing  \\nCarbon pricing mechanisms, such as carbon taxes and cap-and-trade systems, incentivize \\nemission reductions by assigning a cost to carbon emissions. These policies encourage'), Document(metadata={'source': '../data/Understanding_Climate_Change.pdf', 'page': 27}, page_content='Legacy for Future Generations  \\nOur actions today shape the world for future generations. Ensuring a sustainable and resilient \\nplanet is our responsibility to future generations. By working together, we can create a legacy \\nof environmental stewardship, social equity, and global solidarity.  \\nChapter 19: Climate Change and Policy  \\nPolicy Development and Implementation  \\nNational Climate Policies  \\nCountries around the world are developing and implementing national climate policies to \\naddress climate change. These policies set emission reduction targets, promote renewable \\nenergy, and support adaptation measures. Effective policy implementation requires'), Document(metadata={'source': '../data/Understanding_Climate_Change.pdf', 'page': 18}, page_content='This vision includes a healthy planet, thriving ecosystems, and equitable societies. Working together towards this vision creates a sense of purpose and motivation . By embracing these principles and taking concerted action, we can address the urgent challenge of climate change and build a sustainable, resilient, and equitable world for all. The path forward requires courage, commitment, and collaboration, but the rewa rds are immense—a thriving planet and a prosperous future for generations to come.  \\nChapter 13: Climate Change and Social Justice  \\nClimate Justice  \\nUnderstanding Climate Justice  \\nClimate justice emphasizes the ethical dimensions of climate change, recognizing that its impacts are not evenly distributed. Vulnerable populations, including low -income communities, indigenous peoples, and marginalized groups, often face the greatest ris ks while contributing the least to greenhouse gas emissions. Climate justice advocates for')]\n"
     ]
    }
   ],
   "source": [
    "query = \"What is the main topic of the document?\"\n",
    "result = qa_chain.invoke({\"query\": query})\n",
    "print(result[\"result\"])\n",
    "print(\"Source documents:\", result[\"source_documents\"])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
